Towards computationally efficient neural networks with adaptive and dynamic computations
Over the past few years, artificial intelligence has advanced greatly, and deep learning, in which deep neural networks are used to loosely emulate the human brain, has contributed significantly to that progress. Deep neural networks can now achieve great success given a large amount of data and sufficient computational resources. Despite this success, their ability to adapt quickly to new concepts, tasks, and environments is quite limited or even non-existent. In this thesis, we are interested in how deep neural networks can become adaptive to continually changing or totally new circumstances, similarly to human intelligence, and we introduce adaptive and dynamic architectural modules or meta-learning frameworks to achieve this in computationally efficient ways.
This thesis consists of a series of studies proposing methods that use adaptive and dynamic computations to tackle adaptation problems, which are investigated from different perspectives such as task-level, temporal-level, and context-level adaptation.
In the first article, we focus on task-level fast adaptation based on a meta-learning framework.
More specifically, we investigate the inherent model uncertainty that is induced by quickly adapting to a new task with a few examples. This problem is alleviated by combining efficient gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. By developing a Bayesian few-shot learning method that prevents task-level overfitting, we take an important step towards robust meta-learning.
In the second article, we attempt to improve the performance of sequence (i.e., future) prediction by introducing jumpy future prediction based on an adaptive step size. This ability is critical for an intelligent agent exploring an environment, as it enables efficient option-learning and jumpy future imagination. We make this possible by introducing the Hierarchical Recurrent State Space Model (HRSSM), which can discover the latent temporal structure (e.g., subsequences) while also modeling its stochastic state transitions hierarchically.
Finally, in the last article, we investigate a framework that can capture the global context in image data in an adaptive way and further process the data based on that information. We implement this framework by extracting high-level visual concepts through attention modules and using graph-based reasoning to capture the global context from them. In addition, feature-wise transformations are used to propagate the global context to all local descriptors in an adaptive way.
Discrete denoising of heterogenous two-dimensional data
We consider discrete denoising of two-dimensional data with characteristics
that may be varying abruptly between regions.
Using a quadtree decomposition technique and space-filling curves, we extend
the recently developed S-DUDE (Shifting Discrete Universal DEnoiser), which was
tailored to one-dimensional data, to the two-dimensional case. Our scheme
competes with a genie that has access, in addition to the noisy data, also to
the underlying noiseless data, and can employ different two-dimensional
sliding window denoisers along distinct regions obtained by a quadtree
decomposition with leaves, in a way that minimizes the overall loss. We
show that, regardless of what the underlying noiseless data may be, the
two-dimensional S-DUDE performs essentially as well as this genie, provided
that the number of distinct regions satisfies , where is the total
size of the data. The resulting algorithm complexity is still linear in both
and , as in the one-dimensional case. Our experimental results show that
the two-dimensional S-DUDE can be effective when the characteristics of the
underlying clean image vary across different regions in the data.
Comment: 16 pages, submitted to IEEE Transactions on Information Theor
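As a rough illustration of why a quadtree decomposition pairs naturally with a space-filling curve, the sketch below linearizes a square grid in Z-order (Morton order). The abstract does not say which curve the authors use, so the choice of Z-order, and the names `morton_index` and `scan_order`, are assumptions made only for this example. The point it shows is that every quadtree-aligned block maps to a contiguous run of the one-dimensional scan, so a one-dimensional scheme can in principle be applied region by region.

```python
def morton_index(x, y, bits):
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index


def scan_order(side):
    """Return all (x, y) cells of a side-by-side grid in Z-order.

    `side` is assumed to be a power of two (hypothetical restriction
    for this sketch only).
    """
    bits = side.bit_length()
    cells = [(x, y) for y in range(side) for x in range(side)]
    return sorted(cells, key=lambda cell: morton_index(cell[0], cell[1], bits))


if __name__ == "__main__":
    order = scan_order(4)
    # The first four cells form the top-left 2x2 quadrant of the grid.
    print(order[:4])
```

A denoiser that walks the cells in this order visits one quadrant completely before moving to the next, which is what lets a region boundary in two dimensions be treated as a shift point in one dimension.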
Visual Concept Reasoning Networks
A split-transform-merge strategy has been broadly used as an architectural
constraint in convolutional neural networks for visual recognition tasks. It
approximates sparsely connected networks by explicitly defining multiple
branches to simultaneously learn representations with different visual concepts
or properties. Dependencies or interactions between these representations are
typically defined by dense and local operations, however, without any
adaptiveness or high-level reasoning. In this work, we propose to exploit this
strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to
enable reasoning between high-level visual concepts. We associate each branch
with a visual concept and derive a compact concept state by selecting a few
local descriptors through an attention module. These concept states are then
updated by graph-based interaction and used to adaptively modulate the local
descriptors. We describe our proposed model by
split-transform-attend-interact-modulate-merge stages, which are implemented by
opting for a highly modularized architecture. Extensive experiments on visual
recognition tasks such as image classification, semantic segmentation, object
detection, scene recognition, and action recognition show that our proposed
model, VCRNet, consistently improves the performance by increasing the number
of parameters by less than 1%.
Comment: Preprin
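The attend, interact, and modulate stages named above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the VCRNet implementation: the abstract does not specify the attention form, the graph update, or the modulation function, so dot-product attention, a residual weighted sum over concept states, and sigmoid channel gating are all choices made here for the example, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(descriptors, query):
    """Pool a branch's local descriptors into one compact concept state.

    Assumption: attention weights come from a dot product with `query`.
    """
    scores = [sum(d * q for d, q in zip(desc, query)) for desc in descriptors]
    weights = softmax(scores)
    dim = len(descriptors[0])
    return [sum(w * desc[k] for w, desc in zip(weights, descriptors))
            for k in range(dim)]


def interact(states, adjacency):
    """One graph step: each state adds a weighted sum of all states (residual)."""
    count, dim = len(states), len(states[0])
    return [[states[i][k] + sum(adjacency[i][j] * states[j][k] for j in range(count))
             for k in range(dim)]
            for i in range(count)]


def modulate(descriptors, state):
    """Scale every local descriptor channel-wise by sigmoid(state)."""
    gates = [1.0 / (1.0 + math.exp(-s)) for s in state]
    return [[g * x for g, x in zip(gates, desc)] for desc in descriptors]


if __name__ == "__main__":
    # Two branches (visual concepts), each with three 2-d local descriptors.
    branches = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]]
    queries = [[0.1, 0.0], [0.0, 0.1]]          # hypothetical attention queries
    adjacency = [[0.0, 0.5], [0.5, 0.0]]        # hypothetical edge weights
    states = [attend(b, q) for b, q in zip(branches, queries)]
    states = interact(states, adjacency)
    print([modulate(b, s) for b, s in zip(branches, states)])
```

Because each concept state is a single vector per branch, the interaction step works on a handful of vectors rather than on the full feature map, which is consistent with the small parameter overhead the abstract reports.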
Regularization and Kernelization of the Maximin Correlation Approach
Robust classification becomes challenging when each class consists of
multiple subclasses. Examples include multi-font optical character recognition
and automated protein function prediction. In correlation-based
nearest-neighbor classification, the maximin correlation approach (MCA)
provides the worst-case optimal solution by minimizing the maximum
misclassification risk through an iterative procedure. Despite the optimality,
the original MCA has drawbacks that have limited its wide applicability in
practice. That is, the MCA tends to be sensitive to outliers, cannot
effectively handle nonlinearities in datasets, and suffers from having high
computational complexity. To address these limitations, we propose an improved
solution, named regularized maximin correlation approach (R-MCA). We first
reformulate MCA as a quadratically constrained linear programming (QCLP)
problem, incorporate regularization by introducing slack variables in the
primal problem of the QCLP, and derive the corresponding Lagrangian dual. The
dual formulation enables us to apply the kernel trick to R-MCA so that it can
better handle nonlinearities. Our experimental results demonstrate that the
regularization and kernelization make the proposed R-MCA more robust and
accurate for various classification tasks than the original MCA. Furthermore,
when the data size or dimensionality grows, R-MCA runs substantially faster by
solving either the primal or dual (whichever has a smaller variable dimension)
of the QCLP.
Comment: Submitted to IEEE Acces
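To make the maximin objective concrete, the sketch below scores a candidate class template by its worst-case correlation with the class's samples. It illustrates the objective only: it does not implement the iterative MCA procedure, the QCLP reformulation, or the kernelized R-MCA. Correlation is taken here to be cosine similarity, which is an assumption, and the function names and toy data are hypothetical.

```python
import math


def correlation(a, b):
    """Cosine similarity between two vectors (assumed notion of correlation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def min_correlation(template, samples):
    """Worst-case correlation of a template with any sample of the class."""
    return min(correlation(template, s) for s in samples)


def mean_vector(samples):
    """Component-wise mean, a common baseline template."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


if __name__ == "__main__":
    # One class with two subclasses: three samples of one, one of the other.
    samples = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    mean_template = mean_vector(samples)   # pulled towards the larger subclass
    balanced_template = [1.0, 1.0]         # equally correlated with both
    print(min_correlation(mean_template, samples))
    print(min_correlation(balanced_template, samples))
```

In this toy case the mean template serves the minority subclass poorly, while the balanced template has a higher worst-case correlation. That gap is what a maximin criterion optimizes for when a class consists of multiple subclasses.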
Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning
Continual domain shift poses a significant challenge in real-world
applications, particularly in situations where labeled data is not available
for new domains. The challenge of acquiring knowledge in this problem setting
is referred to as unsupervised continual domain shift learning. Existing
methods for domain adaptation and generalization have limitations in addressing
this issue, as they focus either on adapting to a specific domain or
generalizing to unseen domains, but not both. In this paper, we propose
Complementary Domain Adaptation and Generalization (CoDAG), a simple yet
effective learning framework that combines domain adaptation and generalization
in a complementary manner to achieve three major goals of unsupervised
continual domain shift learning: adapting to a current domain, generalizing to
unseen domains, and preventing forgetting of previously seen domains. Our
approach is model-agnostic, meaning that it is compatible with any existing
domain adaptation and generalization algorithms. We evaluate CoDAG on several
benchmark datasets and demonstrate that our model outperforms state-of-the-art
models in all datasets and evaluation metrics, highlighting its effectiveness
and robustness in handling unsupervised continual domain shift learning.
Meta-Learning with Adaptive Weighted Loss for Imbalanced Cold-Start Recommendation
Sequential recommenders have made great strides in capturing a user's
preferences. Nevertheless, cold-start recommendation remains a fundamental
challenge, as it typically involves limited user-item interactions for
personalization. Recently, gradient-based meta-learning approaches have emerged
in the sequential recommendation field due to their fast adaptation and
easy-to-integrate abilities. The meta-learning algorithms formulate the
cold-start recommendation as a few-shot learning problem, where each user is
represented as a task to be adapted. While meta-learning algorithms generally
assume that task-wise samples are evenly distributed over classes or values,
user-item interactions in real-world applications do not conform to such a
distribution (e.g., watching favorite videos multiple times, leaving only
positive ratings without any negative ones). Consequently, imbalanced user
feedback, which accounts for the majority of task training data, may dominate
the user adaptation process and prevent meta-learning algorithms from learning
meaningful meta-knowledge for personalized recommendations. To alleviate this
limitation, we propose a novel sequential recommendation framework based on
gradient-based meta-learning that captures the imbalanced rating distribution
of each user and computes adaptive loss for user-specific learning. Our work is
the first to tackle the impact of imbalanced ratings in cold-start sequential
recommendation scenarios. Through extensive experiments conducted on real-world
datasets, we demonstrate the effectiveness of our framework.
Comment: Accepted by CIKM 202
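The effect of imbalanced ratings on a per-user loss can be illustrated with a simple reweighting. The abstract does not say how the adaptive loss is computed, so the sketch below substitutes a common stand-in, inverse-frequency weighting, in which each distinct rating value of a user receives the same total weight. The function names are hypothetical, and this is not the authors' method.

```python
from collections import Counter


def inverse_frequency_weights(ratings):
    """Per-sample weights so that each distinct rating value has equal total weight.

    The weights sum to len(ratings), so the loss scale stays comparable
    to the unweighted case.
    """
    counts = Counter(ratings)
    total, distinct = len(ratings), len(counts)
    return [total / (distinct * counts[r]) for r in ratings]


def weighted_squared_error(predictions, ratings):
    """Weighted mean squared error for one user's (task's) interactions."""
    weights = inverse_frequency_weights(ratings)
    weighted_sum = sum(w * (p - r) ** 2
                       for w, p, r in zip(weights, predictions, ratings))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # A user who rated three items 5 and a single item 1.
    ratings = [5, 5, 5, 1]
    predictions = [4.0, 4.0, 4.0, 4.0]   # a model that ignores the rare rating
    print(inverse_frequency_weights(ratings))
    print(weighted_squared_error(predictions, ratings))
```

For this user the unweighted mean squared error is 3.0, while the reweighted loss is about 5.0, so the single negative rating is no longer drowned out by the repeated positive ones during user-specific adaptation.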